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Abstract 

The system-level dynamics of multivalent biomolecular interactions can be simulated 
using a rule-based kinetic Monte Carlo method in which a rejection sampling strategy 
is used to generate reaction events. This method becomes inefficient when simulating 
aggregation processes with large biomolecular complexes. Here, we present a rejection- 
free method for determining the kinetics of multivalent biomolecular interactions, and 
we apply the method to simulate simple models for ligand-receptor interactions. Sim- 
ulation results show that performance of the rejection-free method is equal to or better 
than that of the rejection method over wide parameter ranges, and the rejection-free 
method is more efficient for simulating systems in which aggregation is extensive. The 
rejection-free method reported here should be useful for simulating a variety of systems 
in which multisite molecular interactions yield large molecular aggregates. 
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1. Introduction 

Protein-protein interactions in signal transduction involve domain-based pro- 
tein interactions and site-specific modifications [H, 0| . Simulating the dynamics 
of a complex signaling system that has many protein interactions is usually 
a daunting task because a large (bio)chemical reaction network is typically re- 
quired to model interactions at the level of site-specific details and submolecular 
domains 0, 0, S Q • Even though a large-scale biochemical reaction network can 
be built by either manual or automated construction 0, H, H, [l3| , simulating 
such models is computationally inefficient because a conventional kinetic Monte 
Carlo simulation algorithm, for example, has a cost that depends on the size of 
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Figure 1: Diagrammatic depiction of a biochemical system described by rules and its under- 
lying reaction network. Rules partition the entire reaction list into disjoint subsets which are 
consolidated by rules into rate processes denoted by {vi,V2, ■■■,Vm}- The term Tij denotes 
the jth reaction implied by Rule i. Note that partitions of a reaction network specified by 
rules are not necessarily of finite size, which is illustrated in our model of multivalent ligand- 
receptor interactions (see main text). Rule-based kinetic Monte Carlo samples the rule list 
(the intermediate layer) to generate reaction events and avoids simulating a system by means 
of a reaction network (the bottom layer) . 



a network measured by the number of reactions 11] or the number of chemical 
species 



1^. 



The challenge of simulating protein-protein interactions in signal transduc- 
tion can be addressed with a rule-based modeling paradigm (see Ref. [4] for a 
review). Rule-based modeling provides a hierarchical structure to define bio- 
chemical reaction systems (Fig. [T]). In a rule-based approach, molecules are 
modeled as structured objects coniposed of reactive sites, and reaction rules are 
used to represent interactions i, [H m (see Fig. El for examples of rules for 
ligand-receptor interactions) . In general, a rule specifies local properties of indi- 
vidual sites (e.g., whether a site is free or occupied) in a molecule and application 
conditions that require checking non-local information (e.g., whether two sites 
are members of the same macromolecular aggregate). Assuming rate laws for 
elementary reactions, one parameterizes the reaction classes implied by a rule 
with a specific rate constant. Thus, a rule provides a compact representation of 
these reactions based on the law of mass action. 

Kinetic Monte Carlo (KMC) methods have been developed for simulating 
the stochastic dynamics of rule-based models 1^ 16, 17|. The methods of Danos 



et al. [15| and Yang et al. [16| avoid the requirement of specifying a chemical 



reaction network prior to simulation by directly sampling a rule list to generate 
reaction events and updating the system state in accordance with reaction rules. 
The computational cost is essentially independent of network size because a rule 
list is typically more compact and orders of magnitude smaller than the size of 
the corresponding reaction network. 

The rule-based KMC simulation method of Yang et al. [3l involves rejection 
sampling. After a rule is selected, trial sites with suitable local properties are 
randomly sampled to potentially undergo the chemical transformation(s) spec- 
ified by the rule. Trial sites are rejected if they are not compatible with the ap- 
plication conditions of the rule. Random site selection in the rejection sampling 



steps is a constant-time operation, but efficiency might be significantly com- 
promised in cases wliere null events become a dominant fraction of all sampled 
events. This scenario happens when the method is used to simulate multivalent 
ligand-receptor interactions in a sol-gel region that yields a giant ligand-receptor 
aggregate at equilibrium, where sampled trial sites are very likely rejected be- 
cause the vast majority of sites reside in the giant aggregate and are prohibited 
from binding to each other by an application condition (a model assumption) 
that prohibits intra- aggregate binding reactions to form cyclic aggregates. 

In this work, we present a rejection-free KMC method for simulating the 
kinetics of reactions implied by rules. This approach can be used to efficiently 
simulate processes involving multivalent biomolecule clustering and aggregation, 
processes often encountered in biochemical systems. The method samples sites 
for reaction according to their exact probability of taking part in a reaction de- 
fined by a sampled rule and consequently avoids rejecting trial sites altogether. 
The simulation is statistically exact and is validated by comparing to continuum 
solutions of low-dimensional models for multivalent ligand-receptor binding sys- 
tem. We simulate a trivalent ligand - bivalent receptor model using the method 
of Yang et al. [lB| and the rejection- free method. The latter method is found to 
have the same or better efficiency for most parameter values. 

2. Rule-based kinetic Monte Carlo 

In this section, wc describe the strategy of rejection-free KMC for rule-based 
models. In a biomolecular reaction system, particularly in a system involving 
protein-protein interactions for signal transduction, proteins reversibly bind to 
one another via their binding sites (conserved domains). Protein sites (amino 
acid residues) can also be modified, by phosphorylation, methylation, etc. The 
state of the system (or a system configuration) is determined by states of individ- 
ual sites and their connectivities. The temporal dynamics of a biochemical reac- 
tion system is the evolution of the system state in time, which can be modeled 
as a Markov process that is generally described by the Chapman-Kolmo goro v 
equation (or more specifically, the chemical master equation for our case) [18|: 




P(y, t) is the probability that the system is found in state Y, and VF(y jF) gives 
the transition rate from state Y to state Y' . In a conventional chemical reaction 
system, a state Y is defined by the concentrations of all chemical species. In a 
rule-based system, a state Y is defined by the site occupancy and connectivity 
of individual molecules. Analytical solutions to the above master equation are 
only possible for very simple systems. Direct numerical integration of the above 
ordinary differential equations is also intractable because the size of the state 
space is often enormous even for a mildly-complex system that involves interac- 
tions among a few protein types. To solve this problem, Monte Carlo simulation 
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is often applied to conduct random walks over the state space and to generate 
stochastic state trajectories for a system of interest. 

Here we consider a biochemical system that is described by a set of reaction 
rules (see Fig. [T]), R = {Ri,...,Rm}. The system evolves in time with rates 
r]= {rji, ...,ryM} for the rules. Such rate processes can be simulated using the 
standard kinetic Monte Carlo method [l9[. The waiting time r between two 
consecutive (reaction) events is taken to follow an exponential distribution 

P{r) = ?7tote-'"-" , (2) 

where ?7tot = X]f=i Vi is the sum of the rule rates. The mean waiting time for 
a reaction event is the inverse of ?7tot, (''') — For each reaction event, a 

waiting time r is sampled using the following equation: 

In(pi) 

T = 

Vtot 

where pi is a uniform random number in (0, 1). The probability for a rule to 
be chosen is proportional to its rate. Therefore, a specific rule e is sampled by 
finding the integer e that satisfies: 

e— 1 e 

P27?tot <^Vt , 
i=l i=l 

where p2 is a uniform random number in (0, 1). 

The rule-based kinetic Monte Carlo algorithm requires several additional 
operations per reaction event following the selections of t and e: (1) select 
reactive sites admissible to the rule; (2) excute the chemical transformation 
defined by rule e; and (3) update components in the rate vector r) that are 
affected by the chemical transformation. 

To choose reactive sites, we can write the rule-specific probability for a site 
a; in a candidate set to be chosen as (we note that a "site" x denotes a 
set of interacting protein sites that are defined by the rule. For example, for a 
bimolecular interaction, x will include two interacting sites) 

P{X) ^ Pa\s{x)Ps{x) . (5) 

This equation describes a two-step procedure to find a site x: (a) choose a 
site X according to the sampling probability Ps{x) and (b) accept the chosen 
X according to the acceptance probability P^\g{x). In the rejection method, 
a convenient sampling distribution Ps{x) of candidate sites, usually a uniform 
distribution for maximum sampling efficiency, is used to generate trial sites and 
trial sites may be rejected according to the acceptance probability Pa\s{x) so 
that the true probability P{x) is recovered. The rejection sampling introduces a 
"null" process by sampling nonreactive sites such that this null process advances 
time but does not change the system state. Therefore, we can partition the rate 
of a rule into two components: 

rje = rje.a + Ve.n e=l,...,M, (6) 
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Figure 2: The interactions of a trivalent ligand and a bivalent cell-surface receptor (left). 
Graphical rules (right) that represent free ligand recruitment to the cell surface (Rule 1), re- 
ceptor crosslinking by ligand (Rule 2) and ligand-receptor bond dissociation (Rule 3). Rules 
are parameterized by rate constants fc+i, fc+2 and fcoffi respectively. An empty (filled) circle 
inside a container denotes a free (bound) site. A connecting line indicates a ligand-receptor 
bond. An empty container indicates an arbitrary state (free or bound). An additional as- 
sumption is imposed to prohibit reactions defined by R2 from forming cyclic ligand-receptor 
aggregates (see main text). 



where rje^a and rje^n denote the actual reactive rate and the null process rate 
of rule e, respectively. Generally, the size of the null fraction of rj^ n depends 
on the site sampling procedure which also affects the calculation of the rule 
rate. Operating with the combined rate rje ignores any application condition 
by model assumption imposed on this rule in the step of site sampling. The 
model assumption can however be enforced by a subsequent rejection of the 
sampled sites that are non-permissible to react. Because the rejection method 
often can achieve random sampling of reactive sites, the cost per sampling step 
is constant and the implementation is usually straightforward. Determining 
whether to accept trial sites for reaction may however incur non-constant cost 
and complex bookkeeping, which depends on the nature of model assumptions. 

The effect of rejection sampling on computational performance can be mea- 
sured by the rejection ratio: 

9 = , (7) 

??tot 

where Tytot.n = "^fLi Vi,n is the total rate of null processes. For some systems 
where 9 may be close to 1, a majority of trial sites will be rejected and the algo- 
rithm becomes inefficient. For such situations, an algorithm without rejections 
(or with a substantially reduced number of rejections) is desirable. 

In contrast, a rejection- free method always accepts sampled site x. In other 
words, Pa\s{x) is unity for every x and P{x) is captured at the step of site 
sampling, i.e., P{x) = Ps{x). 

An early rejection-free Monte Carlo simulation method was developed by 
Bortz et al. [20] to simulate Ising spin systems, which overcomes the signif- 
icant slowing down near the critical temperature by the classic Metropolis 



method [2l[. The algorithm is often called the BLK algorithm, or the n-fold 



way method, and it has been widely used to simulate systems with high density 
and/or at low temperature. Gillespie later developed a stochastic simulation 
algorithm, equivalent to the method of Bortz et al. [2d| , for simulating coupled 
chemical reactions in a homogeneous reaction compartment j22l |. Below, we 
describe a rejection- free procedure for simulating rule-based models. 
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Without loss of generality, we only consider unimolecular and bimolecular 
reactions. Higher-order reactions can usually be decomposed into these two 
types of elementary reactions. For a unimolecular reaction (e.g., a single-step 
site modification or a noncovalent bond dissociation between two proteins) of 
rule e that involves a single type of protein site A, at a given time, the candidate 
sites are defined in a set Xe — {ai\i — l,...,na} that includes all legitimate 
sites designated by the rule and its application conditions. In many cases, by 
the mean-field approximation, sites available according to a rule definition are 
considered to have equal probability to be chosen. The probability of a site 
X — ai to be chosen is 



where Wi is the reaction propensity for site a.; and Z is a partition function Z = 
Ve = J2x "^i- general Wi reflects the biophysical property that determines 
the reaction propensity of the site ai. In most applications, a rule usually defines 
reactions of sites from the same protein types and with identical properties. For 
this reason and under the law of mass action, we can interpret Wi as the specific 
rate (or, the rate constant) for the rule. The partition funtion becomes the 
total rule rate, Z — fce|Xe|, where \Xf,\ denotes the size of the candidate set 
Xf, (i.e., \Xf,\ = Ua). Effectively, Eq. [5] defines a uniform sampling distribution 
over Xf^ . The above analysis can be readily extended to the case of bimolecular 
reactions. Considering an interaction such as binding between two types of 
protein sites A and B, we have a candidate set Xe = {ai,bj\i = 1, ...,no, j = 
1, nf,}. The probability of a pair of sites x = {ai, bj) to be chosen to react is 



Similarly, Wij can be interpreted as a specific rate ke that is identical for all 
site pairs. According to the law of mass action, the partition function Z = 

J2x, '^ij = ke\Xe\ = keUaTlh. 

Based on the above considerations, we summarize the rejection-free algo- 
rithm as follows. 

1. Initialization: assign copy numbers of proteins and specify initial states 
of individual protein sites, assign rate constants k for rules, and calculate 
the initial values of r). 

2. Sample a waiting time t and choose a reaction rule e according to Eqs. ([3]) 
and dH), respectively. 

3. Sample protein sites based on the distribution P{x), update states of the 
chosen sites according to the rule specification, and recalculate rate vector 
V- 

4. Repeat Steps 2 and 3. 

The above steps provide a general strategy for simulating rule-based bio- 
chemical reaction systems. In the following section, we apply the method to 
simulate a multivalent ligand-receptor interaction model and use the model to 
discuss the implementation details. 




(8) 



P{x 



{a„6j}) = Ps{x = {a^,bj}) = . 



(9) 
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3. Multivalent ligand-receptor binding systems 



Cell-surface receptor aggregation induced by ligand binding is an important 
step in many signal transduction pathways j23|. In an earlier study, Goldstein 
and Perelson developed an equilibrium model to investigate binding interactions 
between trivalent extracellular ligands and bivalent cell-surface receptors 
The Goldstein-Perelson model predicted in theory that for certain parameter 
values small receptor aggregates and "superaggregates" can coexist at equilib- 
rium. In fact, sol- gel phase transition phenomena can happen in any multivalent 
ligand-receptor system in which either the ligand or the receptor has more than 
two binding sites and each ligand and receptor has at least two binding sites. 

Recently, Yang et al. [3] developed a rule-based kinetic Monte Carlo method 
that was used to simulate the stochastic dynamics of a kinetic version of the 
Goldstein-Perelson model, which has been called the TLBR model, where TLBR 
represents "trivalent ligand - bivalent receptor." The gelation processes of ligand- 
receptor aggregation were observed in simulations, which was consistent with 
the Goldstein-Perelson theory of sol-gel phase transition. The simulation fur- 
ther revealed the kinetic details about formation of superaggregates. Monine 



et al. [25| also applied the same method to study steric effects using extended 



kinetic versions of the Goldstein-Perelson model. The KMC simulation method 
of Yang et al. [31 includes rejection sampling, which becomes inefficient when 
a system forms a superaggregate that contains most ligands and receptors. In 
this section, we evaluate the rejection- free method approach by applying it study 
multivalent ligand-receptor interaction systems. 



3.1. Model of ligand-receptor binding 

We consider a system with ligands and receptors. The ligand and 
receptor have vi and Vr symmetric binding sites, respectively. Three types of 
reactions are considered: (1) free ligand recruitment to a receptor on the cell 
surface, (2) crosslinking of two cell-surface receptors by a ligand that is already 
bound on the cell surface but has at least one free site, and (3) dissociation of a 
ligand-receptor bond. The rule-based specification of the TLBR model, a kinetic 
extension of the Goldstein-Perelson model, is illustrated in Fig. [51 The model 
has three reaction rules (Rules 1-3). To compare results with the Goldstein- 
Perelson model, we impose the same model assumptions: a) binding sites are 
equivalent [l^l, b) a ligand cannot associate with a receptor via more than one 
bond, and c) a ligand cannot associate with a receptor that is a member of the 
same aggregate (an aggregate is a ligand-receptor complex that has at least one 
ligand and one receptor in it). These assumptions prevent the formation of cyclic 
ligand-receptor aggregates and affect the calculation of rule rates, which is the 
actual reason that distinguishes rejection and rejection-free sampling prcedures. 

3.2. Calculation of rule rates and site sampling 

All rule rates are calculated according to the law of mass action. We first 
give equations for directly calculating rji and rj^. Calculation of ri2 requires 
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Figure 3: Validation of the simulation algorithm for multivalent ligand-receptor interaction 
systems, (a) Comparison of normalized results to the ODE solutions for the bivalent ligand 
and bivalent receptor interaction system. The normalization factor is Nn/n, where n is the 
number of receptors in an aggregate (n, = 2 for dimers, n = 3 for trimers, etc.). The means and 
standard deviations of the simulation results obtained by the simulation method are shown 
on top of the continuous ODE solutions. One stochastic time trajectory is shown for the free 
receptor population, (b) Stochastic receptor aggregate distribution for the trivalent ligand 
and bivalent receptor system. The system reaches equilibrium after 350 seconds and the 
averages of equilibrium distributions match the results (solid curve) obtained using the model 
of Goldstein and Perelson [IJ. Parameter values: Nr = 300, = 4200, fc+i = 6.67 X lO"'^ 
s~^, = lOOfc+i, fcofj = 0.01 s~^. Initial condition: all simulations start with free ligands 
and free receptors without bonds. 



non-trivial treatment, and different treatment implies different implementation 
of the algorithm. We will explain these issues in detail. 

The rate of free ligand recruitment to a cell-surface receptor for Rule 1, ryi, 
is proportional to the product of the number of free sites on free ligands and 
the number of free receptor sites on the cell surface, 

m = k+iViFLiVrNR - B) , (10) 

where is the number of free ligands. We note that Rule 1 specifies a class 
of bimolecular reactions and the rate constant k+i absorbs a volume factor 
(this also applies to the rate constant k+2)- The rate of ligand-receptor bond 
dissociation for Rule 3, 7^3, is proportional to the number of ligand-receptor 
bonds, 

m = kosB . (11) 

The number of bonds B increases (decreases) by 1 upon an association (disso- 
ciation) event. 

For ligand-mediated receptor cross-linking on the cell surface (an event of 
Rule 2), to avoid forming cyclic ligand-receptor aggregates, one must ensure 
that an association event joins either two separate ligand-receptor aggregates or 
a ligand-receptor aggregate and a free receptor. We note that the probability for 
a ligand site to be chosen is proportional to the number of free receptor sites to 
which the ligand site may bind, i.e., the number of free receptor sites excluding 
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ones in the same aggregate with the candidate ligand site. Once a hgand site is 
chosen, the probabihty of selecting a binding receptor site is uniform for ah free 
receptor sites that are not in the same aggregate with the selected ligand site. 

A straightforward approach is to calculate the probability for each free re- 
ceptor site to take part in the next event and the probability for each free ligand 
site (on the cell surface) to crosslink the free receptor site. However, the cost of 
searching over sites and updating site probabilities would scale with the number 
of molecules per Monte Carlo step. 

At a given time, if the system has Na ligand-receptor aggregates on the cell 
surface (excluding free receptors) , the rate 772 can be given by a direct sum over 
combinations between distinct aggregate pairs, i.e.. 



772 = k 



+2 



Na Na 



(12) 



where h and are the numbers of free ligand and receptor sites in the ith 
aggregate, respectively, and Fr is the number of free receptors. The term 
VrF]i{vi{NL — Fl) — Nb) accounts for the interactions between free ligand sites 
on the cell surface and free receptor sites. However, the computational cost of 
Eq. (|12p is 0{N\), which is impractical when Na is large. Alternatively, rate 
r]2 can be calculated as 

Na 

m = k+2 UvvNr ^B-n) , (13) 

The term {vrNu — B — Vi) is the total number of free receptor sites that are not 
in aggregate i. From the above equation, it follows that the probability that an 
aggregate i contributes a ligand site for a Rule 2 event is 

„ k4.2hiVrNf/ — B — Tj) , , 

p_^^^^^_R i = \^...^Na. 14 

m 

We first sample this distribution to locate an aggregate i that provides a free 
ligand site. All free ligand sites in aggregate i are equivalent (according to 
model assumptions) and therefore each of them has an equal probability to be 
chosen. Then, a free receptor site that is not in aggregate i is chosen to bind 
with the sampled ligand site. Similarly, all such free receptor sites (either in 
another aggregate or in the free receptor pool) are equivalent and therefore 
can be randomly sampled. The number of free ligand and receptor sites, li 
and r^, can be calculated when the numbers of ligands and receptors in an 
aggregate are known. In an acyclic aggregate that has receptors and rt; 
ligands, the number of bonds is given as 5 = + rt; — 1, and the numbers of 
free receptor and ligand sites can be calculated as r = vini — b and I ~ VrUr — b, 
respectively. This corresponds to a graph-theoretic observation that a connected 
acyclic graph (equivalent to a tree) has a number of edges one less than the 



number of nodes [27|. A ligand-receptor aggregate can be represented as a 
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bipartite graph, in which receptors and ligands are nodes and hgand-receptor 
bonds are edges. The numbers of receptors and Hgands can be obtained by a 
linear-time traversal of the aggregate graph. A better, alternative approach is 
to maintain an aggregate list during simulation. This approach avoids routine 
graph traversal of aggregate graphs for each reaction event except for bond 
dissociation events, in which graph traversals are required to determine the 
resulting two separate aggregates. Direct calculation of 772 using Eq. ([13]) has a 
linear-time cost scaled by the number of aggregates Na- To achieve a constant- 
time cost per event, we update 772 iteratively by accounting for the change caused 
by a reaction event. Eq. ([T^ can be rewritten as follows: 

V2 = k+2(^[vi{NL-FL)-B]{vrNR-B)-Y,kr}j . (15) 

The term [vi {Nl ~Fl) — B] [vrNn — B) accounts for all pair combinations of free 
ligand sites (on the cell surface) and free receptor sites under the law of mass 
action. We note that, in a method using rejection sampling [l^, this quantity 
is used to calculate rate 772 and a trial pair of ligand and receptor sites can 
be randomly chosen but are subject to rejection later if they are found to be 
components of the same aggregate. For our rejection-free implementation, the 
exact rate is obtained by subtracting u = Uti, where u is the number of 

intra-aggregate site-pair combinations, to accommodate the model assumption 
that prohibits intra-aggregate binding reactions. The first term in Eq. (|15p 
can be easily recalculated with a constant cost after each reaction event. The 
second term can be updated iteratively by Am (Table [T]). Each reaction event 
changes the list of aggregates by creating a new one, or modifying or removing 
existing ones. For instance, a free ligand binding to a receptor may (if the 
receptor is also free) or may not (if the receptor is already bound to another 
ligand) create a new aggregate. However, most aggregates do not participate in 
a reaction event and thus remain unchanged. Therefore, updating Eq. (|15p is 
straightforward by accounting for sites on a small number of affected aggregates 
(usually one or two) and it has a constant-time cost. For example, a merge of 
two aggregates i and j creates one new aggregate k {Na decreases by 1). In 
this case, on the newly-formed aggregate, free ligand and receptor sites both 
decrease by 1 from the sums of those on the two merging aggregates, and it 
follows that Au — l^rk — h^i — Ijfj, where Ik = li + lj — 1 and rk = ri + rj — I. 
Formulas for all event types are given in Table [T] 

Our implementations of the rejection method of Yang et al. [16] and the 
rejection-free method reported here are available upon request. These imple- 
mentations simulate the TLBR model, as well as models for rTi-valent ligand 
and n-valent receptor that incorporate model assumptions similar to those of 
the Goldstein-Perelson model. The rejection and rejection-free codes were writ- 
ten with the intention of making them as much alike as possible, except for 
the sampling strategies and rule rate calculations, to eliminate irrelevant differ- 
ences. We note that our implementations of the simulation methods discussed 
here provide new tools for studying the equilibria and kinetics of multivalent 
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Table 1; Formulas for calculating Am2 



Rule Event Am2 

1 A free ligand binds to a free receptor (d,. — — 1) 
A free ligand binds to aggregate i (f; — l)(ri — 1) — 

2 Aggregate i associates with a free receptor (vr — — 1) — r; 
Aggregates i and j associate to form a new aggregate k Iki'k ~ hi'i — Ij^i 

3 An aggregate breaks into a free ligand and a free receptor — (fr — ^){t>i — 1) 
A ligand dissociates from aggregate i li — (vi — l){ri + 1) 
A receptor dissociates from aggregate i fi — {vr — + 1) 
Aggregate k dissociates into two aggregates i and j liVi + l^rj — Ij^r^ 



ligand-receptor interactions and gelation on cell membranes, topics which have 
been studied intensely over the years via a variety of approaches 2^, 2^ 3^, 31 1. 



4. Simulation results 

In this section, we present results obtained using the rejection- free algorithm 
described above. Parameter values used in simulations were taken from Ref. [l6l |. 

To validate the accuracy of our method, we compare simulation results with 
those obtained using ordinary differential equations (ODEs). The relaxation 
kinetics of a bivalent ligand {vi — 2) - bivalent receptor [vr — 2) system to 
equilibrium is shown in Fig. [3l^a). The deterministic results were obtained by 
solving a small number of ODEs as described in Refs. 2^ 32 1. The stochastic 



trajectories recapitulate on average the deterministic solutions. In Fig. Eljb), 
the equilibrium distribution of receptor aggregates in a TLBR system agrees on 
average with the equilibrium results from the Goldstein-Perelson model. These 
results demonstrate that the simulation algorithm produces outcomes consistent 
with those obtained independently. 

A phase transition in ligand-receptor clustering at equilibrium is predicted 
in the Goldstein-Perelson model by varying two dimensionless parameters Ctot = 
Zkj^iN L / kofi and /3 = which shows that finite-sized ligand-receptor 

aggregates can coexist with an infinite-sized polymer-like aggregate in a so- 
called sol-gel coexistence regime To investigate the stochastic dynamics of 
ligand-receptor aggregation and the algorithmic performance at different phase 
regimes, we use the average receptor aggregate size to measure the degree of 
ligand-receptor clustering, which can be calculated as 

Nr ^ Nr 

n=l ^ n=l 

where n is the size of a ligand-receptor aggregate measuring the number of 
receptors in the aggregate and x„ is the number of aggregates of size n. The 
term p„ = nXn/Nj^ is the fraction of receptors in aggregates of size n, which 
corresponds to the probability of finding an arbitrary receptor in an aggregate of 
size n. We note that in the above equation the number of aggregates containing 
single receptors, xi, accounts for all receptor monomers including free receptors. 
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Number of receptors 



Figure 4: The relationship between {Njn) and Nn for the trivalent Ugand-bivalent receptor 
system. The number of Ugands A'^^ is set equal to the number of receptors A^^. (N^) is 
obtained by calculating its mean value from 10^ consecutive reaction events after the sys- 
tem reaches equilibrium. Other parameters and initial conditions are identical to the values 
indicated in Fig. \3\ 

Therefore, at any given time the average aggregate size T takes a value in the 
interval [1, Nu]. In practice, since the number of aggregates Na is always much 
less than Nj^ (Fig.H]), F can be equivalently and more efficiently calculated with 
the following equation: 



where is the number of free receptors and rij is the number of receptors in 
aggregate j. The average aggregate size T can be also calculated iteratively dur- 
ing a simulation by accounting for the changes of aggregation caused by reaction 
events. To study the effect of the degree of aggregation on algorithmic efficiency 
by adjusting the parameters ctot and /3, we ran simulations of the TLBR model 
to equilibrium under a set of different values of the dissociation rate constant 
fcoff in the range between 10^^ s^^ and 100 s^^, with both Nl and Nji fixed at 
5000. Figure [Sfa) shows a sigmoid-like relationship between the mean average 
aggregate size (F) and the dissociation rate constant fcoff . At smaller fcoff (sol-gel 
region), (F) approaches its maximum value (close to the number of receptors 
Nji), which indicates that a large aggregate containing most receptors exists 
in the system. At larger kog (sol region), (F) approaches its minimum value, 
indicating that the majority of receptors are in the form of free receptors and 
ligand-bound receptor monomers. The mean aggregate number (Na) exhibits 
a bell-shaped curve with a maximum value of 966 near the phase transition 
boundary at kos = 0.17. Figure [S^b) shows the performance comparison be- 
tween the rejection-free method described in this work and the rejection method 
of Yang et al. 16] in terms of CPU time per reaction event, for the TLBR model 
with a typical set of parameters. In most of the sol region (/coff > 1 s^^) and 
near the sol-gel region (10~^ s~^ < fcoff < 10 s~^), the rejection-free and re- 
jection methods match each other in performance, and the efficiency of both 




(17) 
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Mean average aggregate size (<r>) 



Figure 5: Performance comparison of the rejection-free method and the rejection method, (a) 
The mean average aggregate size (F) (soUd hne) and the average number of aggregates (A'^^) 
(dashed line). The results are normalized by the total number of receptors, Nn. Simulation 
parameters; = 2,vi = 3, iV/j = N]^ = 5000. (b) Relationship between CPU time and 
^oS- rejection-free method (solid line), rejection method of Yang et al. [T3 with aggregate 
bookkeeping (dashed line), (c) The relationship between the rejection ratio 9 and the mean 
average aggregate size (F). CPU times per reaction event, (F) and (Nji) are calculated by 
averaging over 200,000 reaction events after simulations reach equilibrium. Other parameters 
and initial conditions are the same as indicated in Fig. [S] 
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Figure 6: (c) Scaling with number of binding sites on the Ugand (vi). For both the rejection- 
free (squares) and the rejection (circles) simulations, the receptor valence is fixed at Vr = 2 and 
the copy numbers of receptor and ligand are = A'^^ = 1000. CPU times per reaction event 
are calculated by averaging over 200,000 reaction events after simulations reach equilibrium. 
Other parameters and initial conditions are the same as indicated in Fig. [3l 



methods deteriorates as (F) increases. The rejection-free method is affected 
by searching over a near maximum number of aggregates, which is reflected in 
the increases of CPU time per reaction event near the boundary of the phase 
transition (10^^ s^^ < fcoff < 1 s~^) compared to that of the rejection method. 
However, the rejection-free method is less sensitive to (F) and outperforms the 
rejection method in the highly-aggregated sol-gel region (fcoff < lO""*), where 
the rejection method has a rejection ratio 9 greater than 99% at equilibrium. 
As shown in Fig. [SJc), the rejection ratio 9 has a near log- linear dependence on 
F over a wide range. 

To investigate the effect of valence, we varied the number of ligand binding 
sites VI and kept receptor valence fixed at = 2 (Fig. EJb)), which can be 
accomplished experimentally through synthesis of multivalent ligands 33, 13, 
list . Compared to the method of Yang et al. [l^ , the rejection- free method has 
better scaling of CPU time per reaction event with the increase of vi compared to 
the rejection method. Except for the system with bivalent ligands, the rejection- 
free method is almost insensitive to changes in the number of ligand binding 
sites, whereas sampling by the rejection method involves large numbers of null 
events due to increases in intra-aggregate combinations of available ligand and 
receptor site pairs. 



5. Discussion 

We have presented a rejection-free method for simulating biochemical re- 
action models specified by reaction rules. A kinetic Monte Carlo procedure is 
applied to sample the rule list, identify reactant proteins, and update protein 
states. In this procedure, all chemical species are formed dynamically. For this 
reason, our method has a computational cost independent of reaction network 
size. The implementation described here is a rejection-free method in which 
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every Monte Carlo step changes the state of the system. The method accounts 
for the exact rates of reaction rules and probability distributions of protein 
sites, including rules that require evaluation of non-local state information. In 
a more general and straightforward implementation, selecting candidate reac- 
tant sites based on these probability distributions incurs linear-time cost per 
reaction event scaled by the number of sites. In contrast, the rejection-free 
implementation reported here in fact scales with the number of existing chem- 
ical species. In the multivalent ligand-receptor interaction model, the number 
of chemical species during a simulation corresponds to the number of ligand- 
receptor aggregates, Na- Although the rejection- free method has a higher cost 
per Monte Carlo step because of searching candidate sites and extra bookkeep- 
ing required for calculating probabilities for sites, the method outperforms the 
rejection method when the rejection ratio 9 approaches unity. Our method is 
closely related to the direct simulation Monte Carlo (DSMC) method devel- 
oped to simulate coagulation processes where irreversible particle aggregation 
is considered j36j . 

A comparison between rejection-free and rejection methods was briefly dis- 
cussed by Yang et al. [l^. The rejection- free algorithm reported by Yang et 
al. [3| takes a strategy of searching for a ligand-receptor site pair for cross- 
linking, which is less efhcient for simulating formations of large ligand-receptor 
clusters in comparison to the algorithm presented in this report. This earlier 
algorithm maintains reaction probabilities of all sites for searching, which has 
a cost of CPU time per reaction event scaling with the number of molecules. 
In contrast, the current method searches over an aggregate list that is compact 
compared to the full list of molecule sites. For example, as shown in Fig. El^a), 
the maximum number of aggregates on average (observed around kos = 0.1 
s""'^) was less than one fourth of the total number of receptors {Nft, = 5000). At 
other values of /coff, {Na) is much less than Nn (Fig[5ja)). As shown in Fig. |4l 
at equilibrium {Na) has a moderate (sublinear) dependence on the number of 
molecules. 

Both rejection and rejection-free methods require explicitly tracking con- 
nectivity between sites. This feature erodes the efficiency of simulation. In 
simulating the multivalent ligand-receptor binding model, to process an aggre- 
gate dissociation, at least one unweighted traversal of an aggregate subgraph 
is necessary (with no prior information about which subaggregate is smaller). 
This presents a major bottleneck to simulating systems in the sol-gel regime 
because a graph traversal has an order of cost proportional to the size of the 
aggregate. In the sol-gel regime, most graph traversals will happen on the giant 
aggregate that has a size close to that of the entire system. Experiments can 
provide some information about the composition of a protein aggregate but the 
intra-aggregate (or intra-multiprotein complex) topology usually cannot be re- 
solved. To compare with data, it may be possible to simulate a system without 
tracking the aggregate topology if prediction of connectivity information (e.g., 
distribution of aggregate topologies for an aggregate size) is not of interest. 

In summary, our results suggest that a rejection- free kinetic Monte Carlo 
approach to simulation of rule-based models has appeal for simulating aggrega- 



15 



tion processes, which are common in many biochemical systems. Ahhough our 
implementation in this study is problem-specific for multivalent ligand-receptor 
interactions, our results suggest that a generalized implementation of rejection- 
free procedures along the lines presented here into existing rule-based modeling 
software may be beneficial. 
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